# Reinforcement Learning Tuning
## MiMo 7B RL

MiMo-7B-RL is a reinforcement learning model trained on top of MiMo-7B-SFT. It delivers outstanding performance on mathematical and code reasoning tasks, comparable to OpenAI o1-mini.

- License: MIT
- Tags: Large Language Model, Transformers
- Author: XiaomiMiMo
- Downloads: 11.79k · Likes: 252
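Because the card tags Transformers, a minimal loading sketch follows. The repo id `XiaomiMiMo/MiMo-7B-RL` and the `trust_remote_code` flag are assumptions inferred from the card's author and name, not details stated in the listing.

```python
# Minimal sketch: load MiMo-7B-RL with Hugging Face Transformers.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "XiaomiMiMo/MiMo-7B-RL"  # assumed repo id (card author + name)
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",      # use the dtype stored in the checkpoint
    device_map="auto",       # spread weights across available devices
    trust_remote_code=True,  # assumed: the repo may ship custom model code
)

prompt = "Prove that the sum of two even integers is even."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```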
## Meta Llama 3 70B Fp8

Meta Llama 3 70B is a large language model developed by Meta, with 70 billion parameters and an 8k context length, intended for English-language commercial and research use. As the name indicates, this listing is an FP8-quantized build of the model.

- License: Other
- Tags: Large Language Model, Transformers, English
- Author: FriendliAI
- Downloads: 34 · Likes: 5
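FP8 checkpoints are usually served through an inference engine with FP8 kernel support rather than loaded directly. Below is a sketch using vLLM; the repo id, the `quantization` setting, and the 4-GPU split are illustrative assumptions, and FP8 inference requires hardware that supports it (e.g. Hopper-class GPUs).

```python
# Minimal sketch: serve an FP8 Llama 3 70B checkpoint with vLLM.
from vllm import LLM, SamplingParams

llm = LLM(
    model="FriendliAI/Meta-Llama-3-70B-fp8",  # assumed repo id
    quantization="fp8",        # weights are pre-quantized to FP8
    tensor_parallel_size=4,    # example split of the 70B weights over 4 GPUs
)
params = SamplingParams(temperature=0.7, max_tokens=128)
outputs = llm.generate(["Summarize Llama 3 70B's intended uses."], params)
print(outputs[0].outputs[0].text)
```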
## Meta Llama 3 8B Instruct GGUF

This is a GGUF-quantized version of the 8-billion-parameter instruction-tuned model from the Meta Llama 3 series. It is optimized for dialogue and performs strongly across multiple benchmarks.

- Tags: Large Language Model, English
- Author: MaziyarPanahi
- Downloads: 293.90k · Likes: 88
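GGUF files target llama.cpp-style runtimes rather than Transformers. A minimal sketch with llama-cpp-python follows; the repo id and quantization filename are assumptions based on the card's author and name.

```python
# Minimal sketch: run a GGUF quantization with llama-cpp-python.
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="MaziyarPanahi/Meta-Llama-3-8B-Instruct-GGUF",  # assumed repo id
    filename="*Q4_K_M.gguf",  # assumed file pattern; any quant level works
    n_ctx=8192,               # Llama 3's 8k context window
)
out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Explain GGUF in one sentence."}],
    max_tokens=128,
)
print(out["choices"][0]["message"]["content"])
```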
## PPO LunarLander V2

This is a reinforcement learning model trained with the PPO algorithm to solve the landing task in the LunarLander-v2 environment.

- Tags: Reinforcement Learning
- Author: araffin
- Downloads: 65 · Likes: 18
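For context, the sketch below trains a comparable PPO agent from scratch with Stable-Baselines3; it is an illustrative recipe with default hyperparameters, not the exact configuration behind this hub model.

```python
# Minimal sketch: train and evaluate PPO on LunarLander-v2.
import gymnasium as gym
from stable_baselines3 import PPO
from stable_baselines3.common.evaluation import evaluate_policy

# LunarLander-v2 needs the box2d extra: pip install "gymnasium[box2d]"
env = gym.make("LunarLander-v2")
model = PPO("MlpPolicy", env, verbose=1)
model.learn(total_timesteps=100_000)  # short run; the hub model trains longer

mean_reward, std_reward = evaluate_policy(model, env, n_eval_episodes=10)
print(f"mean reward: {mean_reward:.1f} +/- {std_reward:.1f}")
```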